Binary coordinate ascent: An efficient optimization technique for feature subset selection for machine learning
نویسندگان
چکیده
Feature subset selection (FSS) has been an active area of research in machine learning. A number of techniques have been developed for selecting an optimal or sub-optimal subset of features, because it is a major factor to determine the performance of a machine-learning technique. In this paper, we propose and develop a novel optimization technique, namely, a binary coordinate ascent (BCA) algorithm that is an iterative deterministic local optimization that can be coupled with wrapper or filter FSS. The algorithm searches throughout the space of binary coded input variables by iteratively optimizing the objective function in each dimension at a time. We investigated our BCA approach in wrapper-based FSS under area under the receiver-operating-characteristic (ROC) curve (AUC) criterion for the best subset of features in classification. We evaluated our BCA-based FSS in optimization of features for support vector machine, multilayer perceptron, and Naïve Bayes classifiers with 12 datasets. Our experimental datasets are distinct in terms of the number of attributes (ranging from 18 to 11,340), and the number of classes (binary or multi-class classification). The efficiency in terms of the number of subset evaluations was improved substantially (by factors of 5–37) compared with two popular FSS meta-heuristics, i.e., sequential forward selection (SFS) and sequential floating forward selection (SFFS), while the classification performance for unseen data was maintained. © 2016 Elsevier B.V. All rights reserved.
منابع مشابه
A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization
Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...
متن کاملFeature Selection for Document Ranking using Best First Search and Coordinate Ascent
Feature selection is an important problem in machine learning since it helps reduce the number of features a learner has to examine and reduce errors from irrelevant features. Even though feature selection is well studied in the area of classification, this is not the case for ranking algorithms. In this paper, we propose a feature selection technique for ranking based on the wrapper approach u...
متن کاملOnline Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
متن کاملFeature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm
This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...
متن کاملتعیین ماشینهای بردار پشتیبان بهینه در طبقهبندی تصاویر فرا طیفی بر مبنای الگوریتم ژنتیک
Hyper spectral remote sensing imagery, due to its rich source of spectral information provides an efficient tool for ground classifications in complex geographical areas with similar classes. Referring to robustness of Support Vector Machines (SVMs) in high dimensional space, they are efficient tool for classification of hyper spectral imagery. However, there are two optimization issues which s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Knowl.-Based Syst.
دوره 110 شماره
صفحات -
تاریخ انتشار 2016